There is a dataset, atp_tennis, containing the following columns: ['Tournament', 'Date', 'Series', 'Court', 'Surface', 'Round', 'Best of', 'Player_1', 'Player_2', 'Winner', 'Rank_1', 'Rank_2', 'Pts_1', 'Pts_2', 'Odd_1', 'Odd_2', 'score'].
--- The description of each column in this data is:
Tournament: Name of the tennis tournament (Brisbane International, Chennai Open, Qatar Exxon Mobil Open ...etc)
Date: Date the match was played (year-month-day)
Series: Category or level of the tennis tournament (ATP250, ATP500, Masters1000 and Grand Slams offer 250, 500, 1000, and 2000 ranking points to the winner, respectively.)
Court: Place the match was held (Indoors or Outdoors)
Surface: Type of court surface (Hard, Grass and Clay)
Round: Stage of the tournament (1st Round, 2nd Round, Quarterfinals, Semifinal and The Final)
Best of: Tournament format ("best of 3" or "best of 5")
Player_1: Name of the first competitor in each match
Player_2: Name of the second competitor in each match
Winner: Name of the match winner (either Player_1 or Player_2)
Rank_1: World ranking of Player_1 at the time of the match
Rank_2: World ranking of Player_2 at the time of the match
Pts_1: Ranking points held by Player_1 before the match (accumulated over the season and higher points typically indicate a more successful player)
Pts_2: Ranking points held by Player_2 before the match
Odd_1: Betting odds for Player_1 (reflecting their perceived chances of winning)
Odd_2: Betting odds for Player_2 
score: Final score of the match ("score of Player_1" - "score of Player_2")
---
Consider the contents of the dataset and the requirements from the user. Please note: DO NOT CHANGE FILE AND VARIABLE NAMES THAT I HAVE SET!
Interactions begin:
[USER (data scientist)]: Hi, let's start by assessing the data quality of the atp_tennis dataset. To achieve this, we'll create a Data Quality Report DataFrame that covers two main aspects: a list of columns along with the number of missing values in each column, and a list of unique values for each categorical column, such as Tournament, Series, Court, Surface, and Round.

[YOU (AI assistant)]: Sure, let's create a Data Quality Report DataFrame for the atp_tennis dataset. Here's the code to generate the report:
'''
import pandas as pd
import pickle
from decision_company import read_csv_file, sum_up, is_null, n_unique, create_dataframe

atp_tennis = read_csv_file('atp_tennis.csv')

# Data Quality Report
missing_values = sum_up(is_null(atp_tennis))
unique_values = n_unique(atp_tennis, ['Tournament', 'Series', 'Court', 'Surface', 'Round'])

data_quality_report = create_dataframe({'Missing Values': missing_values, 'Unique Values': unique_values})

print(data_quality_report)

'''
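For reference, here is what an equivalent report looks like in plain pandas, assuming the decision_company helpers (read_csv_file, is_null, sum_up, n_unique, create_dataframe) wrap the standard pd.read_csv, DataFrame.isnull, .sum(), .nunique(), and DataFrame constructor calls. That mapping is an assumption, and the inline sample rows below are hypothetical stand-ins for atp_tennis.csv:

```python
import pandas as pd

# Hypothetical sample rows standing in for atp_tennis.csv
atp_tennis = pd.DataFrame({
    'Tournament': ['Brisbane International', 'Chennai Open', None],
    'Series': ['ATP250', 'ATP250', 'ATP500'],
    'Surface': ['Hard', 'Hard', 'Clay'],
    'Rank_1': [5, None, 12],
})

# Missing values per column
missing_values = atp_tennis.isnull().sum()

# Number of unique values per categorical column
categorical_cols = ['Tournament', 'Series', 'Surface']
unique_values = pd.Series({col: atp_tennis[col].nunique() for col in categorical_cols})

# Non-categorical columns (e.g. Rank_1) get NaN in 'Unique Values' after index alignment
data_quality_report = pd.DataFrame({'Missing Values': missing_values,
                                    'Unique Values': unique_values})
print(data_quality_report)
```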

[USER (data scientist)]: Great. We want to analyze the key statistics and distributions of player rankings, win/loss ratios, and surface types in the atp_tennis dataset across different tournaments and time periods. To do this, we'll create a Descriptive Statistics Table and Visualizations, including histograms, bar charts, and box plots.

[YOU (AI assistant)]: Sure, let's create a Descriptive Statistics Table and Visualizations for the atp_tennis dataset. Here's the code to generate the table and visualizations:
'''
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import pickle
from decision_company import read_csv_file, count_unique_values, transform, generate_summary_stat, concatenate_objects, create_dataframe, create_subplots, create_histogram_subplot, set_plot_split_title, make_xlabel, save_plot, create_scatter_plot, make_ylabel, create_bar_chart, show_plots

atp_tennis = read_csv_file('atp_tennis.csv')

# Count the number of wins for each player:
player_wins = count_unique_values(atp_tennis['Winner'])

# Count the total number of matches played by each player:
player_matches = count_unique_values(atp_tennis['Player_1']) + count_unique_values(atp_tennis['Player_2'])

# Calculate win/loss ratio for each player:
win_loss_ratios = player_wins / (player_matches - player_wins)

# Add win/loss ratios to the dataset:
atp_tennis['Win_Loss_Ratio_1'] = transform(atp_tennis['Player_1'], win_loss_ratios)
atp_tennis['Win_Loss_Ratio_2'] = transform(atp_tennis['Player_2'], win_loss_ratios)

# Combine Rank_1, Rank_2, Win_Loss_Ratio_1, and Win_Loss_Ratio_2 into single columns:
combined_ranks = concatenate_objects(atp_tennis['Rank_1'], atp_tennis['Rank_2'])
combined_win_loss_ratios = concatenate_objects(atp_tennis['Win_Loss_Ratio_1'], atp_tennis['Win_Loss_Ratio_2'])

# Create a DataFrame with combined ranks and win/loss ratios:
combined_data = create_dataframe({'Rank': combined_ranks, 'Win_Loss_Ratio': combined_win_loss_ratios})

# Descriptive Statistics Table:
desc_stats = generate_summary_stat(combined_data)

# Histogram for Player Rankings Distribution:
fig1, ax1 = create_subplots()
create_histogram_subplot(ax1, combined_ranks, bins=50, alpha=0.5)
set_plot_split_title(ax1, 'Player Rankings Distribution')
make_xlabel(ax1, 'Ranking')
save_plot('./ref_result/hist_chart.png')
show_plots()

# Scatter Plot for Player Rankings vs Win/Loss Ratios:
fig2, ax2 = create_subplots()
create_scatter_plot(ax2, combined_data['Rank'], combined_data['Win_Loss_Ratio'], alpha=0.5)
set_plot_split_title(ax2, 'Player Rankings vs Win/Loss Ratios')
make_xlabel(ax2, 'Ranking')
make_ylabel(ax2, 'Win/Loss Ratio')
save_plot('./ref_result/scatter_chart.png')
show_plots()

# Bar Chart for Surface Types:
fig3, ax3 = create_subplots()
surface_counts = count_unique_values(atp_tennis['Surface'])
create_bar_chart(ax3, surface_counts)
set_plot_split_title(ax3, 'Matches by Surface Type')
make_xlabel(ax3, 'Surface')
make_ylabel(ax3, 'Number of Matches')
save_plot('./ref_result/bar_chart.png')
show_plots()

print(desc_stats)

'''
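One pitfall worth noting in the win/loss computation above: if count_unique_values wraps pandas value_counts (an assumption), then summing two counts Series with `+` aligns on the player index and yields NaN for any player who appears in only one of Player_1/Player_2, and dividing wins by losses drops zero-win players to NaN as well. A plain-pandas sketch that sidesteps both issues, on a hypothetical mini-dataset:

```python
import pandas as pd

# Hypothetical mini-dataset standing in for atp_tennis
atp_tennis = pd.DataFrame({
    'Player_1': ['Nadal', 'Federer', 'Murray', 'Nadal'],
    'Player_2': ['Federer', 'Nadal', 'Nadal', 'Murray'],
    'Winner':   ['Nadal', 'Federer', 'Nadal', 'Nadal'],
})

player_wins = atp_tennis['Winner'].value_counts()

# .add(..., fill_value=0) keeps players who appear in only one of the two columns
player_matches = atp_tennis['Player_1'].value_counts().add(
    atp_tennis['Player_2'].value_counts(), fill_value=0)

# Reindex wins so zero-win players get 0 instead of NaN
wins_aligned = player_wins.reindex(player_matches.index, fill_value=0)
win_loss_ratios = wins_aligned / (player_matches - wins_aligned)
print(win_loss_ratios)
```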

[USER (data scientist)]: Thank you, Tapilot. Now let's move on. We want to determine if there is a significant correlation between surface types (hard, clay, grass) and player performance (win/loss ratios, player rankings) in the atp_tennis dataset. To do this, we'll create a Correlation Matrix and perform Statistical Tests to obtain p-values and confidence intervals.

My code snippet template is:

---BEGIN CODE TEMPLATE---

import numpy as np
import pandas as pd
import scipy.stats as stats
import pickle
from decision_company import read_csv_file,

# please import the necessary private functions from decision_company first

atp_tennis = read_csv_file('atp_tennis.csv')

# YOUR SOLUTION BEGIN:
<code1>
[COMPLETE YOUR CODE] 
</code1>
# YOUR SOLUTION END

print("Correlation Matrix:\n", corr_matrix)

# save data
pickle.dump(corr_matrix,open("./pred_result/corr_matrix.pkl","wb"))
print("P-value:", anova_result_clean.pvalue)

# save data
pickle.dump(anova_result_clean.pvalue,open("./pred_result/pvalue.pkl","wb"))
print("t critical value:", t_critical)

# save data
pickle.dump(t_critical,open("./pred_result/t_critical.pkl","wb"))
print("Hard Surface:", hard_ci)

# save data
pickle.dump(hard_ci,open("./pred_result/hard_ci.pkl","wb"))
print("Clay Surface:", clay_ci)

# save data
pickle.dump(clay_ci,open("./pred_result/clay_ci.pkl","wb"))
print("Grass Surface:", grass_ci)

# save data
pickle.dump(grass_ci,open("./pred_result/grass_ci.pkl","wb"))

---END CODE TEMPLATE---

Please note that you must generate the WHOLE Python code, not code segments, based on the code snippet, using pandas 2.0.3 and Matplotlib 3.7.4. You must keep all comments in the code snippet unchanged.

[YOU (AI assistant)]: Sure, let's create a Correlation Matrix and perform Statistical Tests for the atp_tennis dataset. Here's the code to generate the matrix and perform the tests:
'''
import numpy as np
import pandas as pd
import scipy.stats as stats
import pickle
from decision_company import read_csv_file

# please import the necessary private functions from decision_company first

atp_tennis = read_csv_file('atp_tennis.csv')

# YOUR SOLUTION BEGIN:

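The transcript breaks off here, before the body of <code1>. As a standalone sketch of what that solution could compute, here is a plain pandas/scipy version producing the names the template expects (corr_matrix, anova_result_clean, t_critical, hard_ci, clay_ci, grass_ci). The synthetic random data, the choice of Rank_1/Win_Loss_Ratio_1 as the numeric columns, and the mean_ci helper are all assumptions for illustration; the real answer would use the decision_company wrappers on the full atp_tennis.csv:

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Hypothetical synthetic data standing in for the prepared atp_tennis frame
rng = np.random.default_rng(0)
atp_tennis = pd.DataFrame({
    'Surface': rng.choice(['Hard', 'Clay', 'Grass'], size=90),
    'Rank_1': rng.integers(1, 200, size=90),
    'Win_Loss_Ratio_1': rng.uniform(0.2, 3.0, size=90),
})

# Correlation matrix between ranking and win/loss ratio
corr_matrix = atp_tennis[['Rank_1', 'Win_Loss_Ratio_1']].corr()

# One-way ANOVA: does the mean win/loss ratio differ across surfaces?
groups = [g['Win_Loss_Ratio_1'].dropna()
          for _, g in atp_tennis.groupby('Surface')]
anova_result_clean = stats.f_oneway(*groups)

# Two-sided 95% critical t-value for the full sample
t_critical = stats.t.ppf(0.975, df=len(atp_tennis) - 1)

# 95% confidence interval for a sample mean (hypothetical helper)
def mean_ci(sample, confidence=0.95):
    n = len(sample)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    half_width = t_crit * sample.std(ddof=1) / np.sqrt(n)
    return sample.mean() - half_width, sample.mean() + half_width

hard_ci = mean_ci(atp_tennis.loc[atp_tennis['Surface'] == 'Hard', 'Win_Loss_Ratio_1'])
clay_ci = mean_ci(atp_tennis.loc[atp_tennis['Surface'] == 'Clay', 'Win_Loss_Ratio_1'])
grass_ci = mean_ci(atp_tennis.loc[atp_tennis['Surface'] == 'Grass', 'Win_Loss_Ratio_1'])

print("Correlation Matrix:\n", corr_matrix)
print("P-value:", anova_result_clean.pvalue)
```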